Search Result

Imbalanced data classification method based on Lasso and constructive covering algorithm

Yi JIANG, Shuping WU, Kun HU, Linbo LONG

Journal of Computer Applications 2023, 43 (4): 1086-1093. DOI: 10.11772/j.issn.1001-9081.2022040490

To address the insufficient ability of machine learning classification algorithms to identify minority samples in imbalanced data classification problems, an imbalanced data classification method L-CCSmote (Least absolute shrinkage and selection operator Constructive Covering Synthetic minority oversampling technique) was proposed, taking the telecom customer churn scenario as an example. Firstly, the features related to customer churn were extracted through Lasso (Least absolute shrinkage and selection operator) to optimize the model input. Then, a neural network was built through the Constructive Covering Algorithm (CCA) to generate coverages that conformed to the overall distribution of samples. Finally, a single-sample coverage strategy, a sample diversity strategy and a sample density peak strategy were further proposed to perform hybrid sampling to balance the data. A total of 13 imbalanced datasets and 2 desensitized telecom customer datasets were selected from the KEEL database, and the proposed method was verified on the Logistic Regression (LR) and Support Vector Machine (SVM) classification algorithms respectively. On the LR classification algorithm, compared with the Synthetic Minority Oversampling TEchnique Edited nearest neighbor (SMOTE-Enn), the proposed method had the average Geometric MEAN (G-MEAN) increased by 2.32%. On the SVM classification algorithm, compared with the Borderline-SMOTE (Borderline Synthetic Minority Oversampling Technique), the proposed method had the average G-MEAN increased by 2.44%. Experimental results show that the proposed method can mitigate the influence of skewed class distribution on classification, and its recognition ability for rare classes is better than that of the classical balanced data classification methods.
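For binary imbalanced problems, G-MEAN is commonly defined as the geometric mean of the recall on the minority (positive) class and the recall on the majority (negative) class. The abstract does not give the formula, so the following is only a minimal sketch under that assumption; the function name `g_mean` and the sample labels are illustrative and not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumes the common binary definition; `positive` marks the minority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Recall on each class; defined as 0 when a class is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Illustrative labels: 2 minority samples (1) and 4 majority samples (0).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = g_mean(y_true, y_pred)

# A classifier that always predicts the majority class scores 0,
# which is why G-MEAN is preferred over accuracy on skewed data.
majority_only = g_mean(y_true, [0] * len(y_true))
```

Because the score is a product of per-class recalls, ignoring the minority class entirely drives it to zero even when plain accuracy stays high.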

Cross-regional order allocation strategy for ride-hailing under tight transport capacity

Yu XIA, Junwu ZHU, Yi JIANG, Xin GAO, Maosheng SUN

Journal of Computer Applications 2022, 42 (6): 1776-1781. DOI: 10.11772/j.issn.1001-9081.2021091627

On a ride-hailing platform, matching is a core function, and the platform needs to maximize the number of matched orders. However, the demand distribution of ride-hailing is usually extremely uneven, and the starting points or end points of orders are highly concentrated in some time periods. Therefore, an incentive mechanism with early warning was proposed to encourage drivers to take orders across regions, thus rebalancing the platform's cross-regional transport capacity. In this strategy, the order information was analyzed and processed, and an early warning mechanism for transport capacity in adjacent regions was established. To reduce the number of unmatched orders in a region during periods of tight transport capacity and to improve platform utility and passenger satisfaction, drivers in adjacent regions were encouraged to accept cross-regional orders when regional transport capacity was tight. Experimental results on instances show that the proposed rebalancing mechanism improves the average utility by 15% and 38% compared with the Greedy and Surge mechanisms, indicating that the cross-regional transport capacity rebalancing mechanism can improve the platform revenue and driver utility, rebalance the supply-demand relationship between regions to a certain extent, and provide a reference for ride-hailing platforms to balance the supply-demand relationship macroscopically.

Improved BIRCH clustering algorithm

Shen-yi JIANG, Xia LI

Journal of Computer Applications

The BIRCH algorithm is a clustering algorithm suitable for very large data sets. In the algorithm, a CF-tree is built in which every entry in each leaf node must satisfy a uniform threshold T, and the CF-tree is rebuilt at each stage with a different threshold. However, the algorithm does not specify how to set the initial threshold or how to increase the threshold at each stage. In addition, the algorithm can only work with "metric" attributes, which limits its application. This paper made several improvements to the BIRCH algorithm: 1) changed the CF structure so that heterogeneous attributes could be handled; 2) gave a heuristic method for obtaining the initial threshold and increasing the threshold in the second stage of the algorithm; 3) discussed the algorithm's parameters B and L, found that the algorithm had equal performance when B=L, and finally gave a reasonable range for B. Experimental results on public data sets show that the improved algorithm has better performance.
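The threshold condition on leaf entries can be illustrated with the clustering feature (CF) of classic BIRCH, which summarizes a cluster as the triple (N, LS, SS) and merges summaries by component-wise addition. The sketch below covers numeric ("metric") attributes only and checks the threshold against the cluster radius; it is not the modified CF structure proposed in the paper, and the names `CF` and `try_absorb` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CF:
    """Classic BIRCH clustering feature for numeric attributes."""
    n: int           # number of points summarized
    ls: List[float]  # linear sum of the points, per dimension
    ss: float        # sum of squared norms of the points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "CF":
        return cls(1, list(x), sum(v * v for v in x))

    def merge(self, other: "CF") -> "CF":
        # CF additivity: the summary of a union is the sum of the summaries.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def radius(self) -> float:
        # Root mean squared distance of the points to their centroid.
        centroid_sq = sum((v / self.n) ** 2 for v in self.ls)
        return math.sqrt(max(self.ss / self.n - centroid_sq, 0.0))


def try_absorb(entry: CF, x: Sequence[float], threshold: float) -> Optional[CF]:
    """Return the merged entry if it still satisfies the threshold, else None."""
    merged = entry.merge(CF.from_point(x))
    return merged if merged.radius() <= threshold else None


entry = CF.from_point([0.0, 0.0])
absorbed = try_absorb(entry, [2.0, 0.0], threshold=1.0)  # radius becomes 1.0
rejected = try_absorb(entry, [2.0, 0.0], threshold=0.5)  # radius exceeds T
```

A larger threshold lets each leaf entry absorb more points, which yields a smaller tree; this is why the choice of the initial threshold and its growth between stages matters.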